ABSTRACT

As an important tool for information filtering in the era of the socialized web, recommender systems have developed rapidly in the last decade. Benefiting from their better interpretability, neighborhood-based collaborative filtering techniques, such as the item-based collaborative filtering adopted by Amazon, have achieved great success in many practical recommender systems. However, the neighborhood-based collaborative filtering method suffers from the rating bound problem, i.e., the rating it estimates on a target item is bounded by the observed ratings of all its neighboring items. Therefore, it cannot accurately estimate the unobserved rating on a target item if the ground-truth rating is actually higher (lower) than the highest (lowest) rating over all items in its neighborhood. In this paper, we address this problem by formalizing rating estimation as a task of recovering a scalar rating function. With a linearity assumption, we infer all the ratings by optimizing a low-order norm, e.g., the $\ell_{1/2}$-norm, of the second derivative of the target scalar function, while keeping its observed ratings unchanged. Experimental results on three real datasets, namely Douban, Goodreads and MovieLens, demonstrate that the proposed approach can overcome the rating bound problem. In particular, it significantly improves the accuracy of rating estimation, by 37% compared with conventional neighborhood-based methods.

Categories and Subject Descriptors 

H.3.3 [INFORMATION STORAGE AND RETRIEVAL]: 
1. INTRODUCTION

With the explosion of web information in the last decade, it has become increasingly difficult for individuals to discover interesting information in massive web resources. To solve this information overload problem, which has attracted much attention from both academic and industrial communities, various personalized recommender systems have been developed to help users automatically find information of interest. Mainstream approaches include kNN collaborative filtering [5], latent factor models [14, 7, 15], resource projection [33], restricted Boltzmann machines [24], etc. Although recommender systems have played an increasingly important role in online commerce and sharing services such as Amazon, Netflix and YouTube in recent years, there is still much room for improving their accuracy. Problems including sparsity, cold start and diversity remain grand challenges in the open literature [1, 20].

The two major schools, neighborhood-based methods and latent factor models, each face their own difficulties. The family of latent factor models, which originated from matrix factorization and has recently made great progress by incorporating probabilistic graphical models [23] and compressed sensing [4], has succeeded in accurately completing missing ratings. However, its lack of interpretability may limit its application, since interpretability plays a critical role in user experience in practice [27]. Conventional neighborhood-based approaches such as kNN collaborative filtering produce estimates that are much easier to explain, but they suffer from the rating bound problem. Take item-based collaborative filtering as an example. A neighborhood-based approach estimates an unobserved rating as the weighted average of the ratings on similar items, called neighbors, and therefore the estimate is bounded by the neighbors' observed ratings. However, an item a user loves or hates the most is usually rated higher or lower than all its neighbor items, and therefore has no chance to be correctly predicted by a neighborhood-based approach. Similarly, the rating bound problem is also a major challenge for user-based collaborative filtering.

The rating bound problem would not be an issue if the actual bounds, i.e., the items rated higher or lower than all other items at least within a local range, were fully observed. Unfortunately, in many practical scenarios those ratings are not fully observed: for example, a movie fan may be too lazy to label all her favorite movies on each of the dozen movie websites she is registered on, or a reader may be reluctant to post a rating for an extremely boring book. Lacking observations of the actual rating bounds, a neighborhood-based approach such as kNN collaborative filtering suffers seriously from the rating bound problem, because its results are bounded by incorrect bounds. Our empirical analysis shows that up to 15% of estimation tasks suffer from the rating bound problem, and nearly half of the estimation errors of kNN collaborative filtering are due to incorrect estimates on those items. This implies that accurately recovering an unobserved rating affected by the rating bound problem is much more difficult than recovering one unaffected by it.

Besides helping to accurately recover missing ratings, the items rated higher or lower than all other items have value in their own right. Such items are called a user's interest centers. A positive interest center of a user, i.e., an item she rates higher than all neighbor items, is an item she loves the most, such as her five favorite restaurants in the city or her most beloved movies in a library. Correctly discovering and recommending those items (if not yet seen) pleases the user and builds her trust in the system. Symmetrically, a user might have several negative interest centers that she dislikes or hates; e.g., a soft-music fan might be unhappy to find a heavy metal album in a recommendation list. A recommender system should try its best to avoid recommending such disliked items.

In short, it is critical for a recommender system to solve the rating bound problem, not only to predict unobserved ratings more accurately, but also to better sketch a user's interest map and improve user experience. Latent factor models might help reduce the pain caused by the rating bound problem. Nevertheless, because of the practical importance of explaining to users why a recommendation list is produced, and the difficulty of explaining matrix factorization results, we attempt to solve the problem within the line of neighborhood-based methods, which provide more explainable results than latent factor models.

To address the rating bound problem, we view the task of estimating unobserved ratings in a recommender system as a job of function recovery. Given an item-item network built in the same way as in a standard neighborhood-based method, for each user u a scalar function $r_u(\cdot)$ is defined on the network to map any item to a rating. A recommender system is required to recover the whole function $r_u(\cdot)$ as accurately as possible, based on a partial observation of the function values on a few items. With the prior knowledge, verified in practice, that such a scalar function is linear on most items, i.e., its second derivative vanishes on most items, we develop an effective method to recover the function by minimizing the number of items with non-zero second derivatives (for simplicity, in the rest of the paper we call items on which the second derivative of a scalar function is non-zero sources). Experiments show that our approach effectively improves performance when predicting items affected by the rating bound problem.

The major contributions of this paper are as follows:

• We study the rating bound problem from which conventional neighborhood-based approaches suffer.

• To solve the problem, we introduce a scalar function view that treats a recommender system algorithm as a scalar function recovery task based on partial observations.

• We propose an approach that recovers a scalar function by minimizing the $\ell_{1/2}$-norm of its second derivative. Empirical experiments validate its effectiveness.

The rest of the paper is organized as follows. Section 2 introduces our view of recommender systems, and Section 3 describes our approach to solving the rating bound problem, which is validated in Section 4. Section 5 reviews recent progress in recommender system research. Section 6 concludes the paper.

2. SCALAR FUNCTIONS RECOVERY 

In this section we introduce our view of recommender systems as a task of scalar function recovery, and how different approaches leverage a property of these functions for inference.

From a function perspective, completing missing ratings in a recommender system can be considered a job of function recovery. For each user u, a scalar function $r_u(\cdot)$ is defined to map any item i (e.g., a book, music, a movie, a product or a celebrity) to a real or integer rating $r_u(i)$. A recommender system is expected to recover the whole function $r_u(\cdot)$ for each user from a partial observation of the function values. Obviously this is impossible unless prior knowledge or additional evidence is provided.

Different prior knowledge or assumptions result in different approaches. For example, latent factor models assume that the scalar function has an explicit form, a linear combination: each item is represented by a vector whose elements are its "quality scores" on certain features, and a set of "interest" weights is defined for each user to add up those scores into a rating. In contrast, neighborhood-based approaches do not assume an explicit form for the scalar function, but instead assume that the function is "smooth" on an item-item network and can therefore be completed by interpolation. In this paper we extend the line of neighborhood-based approaches because their results are easy to explain.
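To make the contrast concrete, the explicit form assumed by latent factor models can be sketched as a weighted sum of an item's per-feature "quality scores" with a user's "interest" weights. This is a minimal illustration under our own simplifications (no bias terms, hypothetical hand-picked numbers), not a model used in this paper.

```python
def latent_factor_rating(item_scores, user_weights):
    """Rating as a weighted sum of an item's per-feature quality scores.

    item_scores:  the item's "quality score" on each latent feature.
    user_weights: the user's "interest" weight on each feature.
    Bias terms used by many practical models are omitted in this sketch.
    """
    if len(item_scores) != len(user_weights):
        raise ValueError("item and user vectors must have the same length")
    return sum(q * w for q, w in zip(item_scores, user_weights))


# Hypothetical two-feature example: the rating follows from the explicit
# formula alone, without looking at any neighbouring item.
print(latent_factor_rating([4.0, 1.0], [0.75, 0.25]))  # prints 3.25
```

A neighborhood-based method, by contrast, never writes down such a formula; it only relates an item's rating to the ratings of its neighbors.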

A widely held assumption, usually called the similarity assumption, states that if two items were rated similarly in the past, they will be rated similarly in the future. This assumption is the basis of the collaborative filtering technique, which estimates an unobserved rating as the weighted average of the ratings on similar items. An equivalent statement of the assumption is the linearity of a scalar function defined on an item-item network, where nodes are items and edges describe the similarity among items¹.

1 In this paper we discuss item-based collaborative filtering. Similar conclusions hold for its user-based version by symmetry.



We introduce a linearity assumption on a scalar function: its second derivative vanishes on most items, i.e., the following equation holds for most items i,

$$\nabla^2 r_u(i) = 0, \qquad (1)$$

where $\nabla^2$ denotes a discrete second derivative operator. The linearity assumption helps us recover the whole function after observing part of its values.

2.1 Equivalence explanation 

We first explain why the similarity assumption and the 
linearity assumption are equivalent. Let us recall the well- 
known Resnick equation applied in kNN collaborative filter- 
ing to predict an unknown rating r u (i) as follows 2 , 
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where w(i,j) is the weight on the edge between item i and 
item j, i.e., the similarity between them. The weighted av- 
erage is calculated among items in N(i), the neighbor set of 
item i. We move the RHS to the left and have 
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, n, we can rewrite 



Labeling all items with integers 1, 2, 
the above equation in a matrix form, 

(J — D~ 1 W)R V . = 0, 
where Ftu is a vector consisting of r u (-) values, 

In xii is an 

identity matrix, W n xn is the weight matrix with element 
Wij = w(i,j), and the diagonal matrix D n xn is defined as 



Dii = 



i^j. 



Notice that the Laplacian matrix L — I — D~ 1 W is the 
negative of V 2 , which is the discrete second derivative opera- 
tor. The above equation is exactly an second-order ordinary 
differential equation 

LRu = -V 2 R„ = 0, 

as we mentioned in Equation (1). The connection was firstly 
introduced in [31]. 
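The matrix form can be checked numerically. The sketch below is our own illustration on a hypothetical chain of 4 items with unit weights: it builds $L = I - D^{-1}W$ and evaluates $\nabla^2 R_u = -L R_u$. For ratings that grow linearly along the chain, the second derivative vanishes on the two interior items and is non-zero only at the two ends. Exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def random_walk_laplacian(w):
    """Return L = I - D^{-1} W for a symmetric, non-negative weight matrix w.

    D is diagonal with D_ii = sum_{j != i} W_ij; every item is assumed to
    have at least one neighbour so that D_ii > 0.
    """
    n = len(w)
    lap = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        d = sum(w[i][j] for j in range(n) if j != i)
        for j in range(n):
            lap[i][j] = Fraction(1) if i == j else -Fraction(w[i][j]) / d
    return lap


def second_derivative(w, r):
    """Discrete second derivative nabla^2 R = -L R, one entry per item."""
    lap = random_walk_laplacian(w)
    n = len(r)
    return [-sum(lap[i][j] * r[j] for j in range(n)) for i in range(n)]


# Hypothetical chain of 4 items with unit weights: 0 - 1 - 2 - 3.
W_CHAIN = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(second_derivative(W_CHAIN, [1, 2, 3, 4]))  # interior entries are 0
```

A constant rating vector gives all zeros, as expected for a function with no sources.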

2.2 Examining the linearity assumption 

To support the linearity assumption, we empirically examine whether the second derivative of a user's rating function vanishes on most items. On three real datasets (described in Section 4.1), we collect examples where the second derivative $\nabla^2 r_u(i)$ can be directly calculated, i.e., the ratings that a user u posts on an item i and on all its neighbor items in N(i) are completely observed³. We further require that at least 5 ratings be observed on neighbor items; otherwise the calculation of $\nabla^2 r_u(i)$ might be unreliable.

2 In this paper we use Equation (2) to describe kNN collaborative filtering. There are several forms of kNN collaborative filtering, for example $r_u(i) = \bar r(i) + \frac{\sum_{j \in N(i)} w(i,j)\,(r_u(j) - \bar r(j))}{\sum_{j \in N(i)} w(i,j)}$, where $\bar r(i)$ denotes the global average rating of item i. The discussion in this paper can easily be extended to this form with a simple preprocessing step that replaces every $r_u(j)$ with $r_u(j) - \bar r(j)$. Extensions to other forms are similar.

Figure 1: Number of examples in which a user's rating on an item drifts away from the estimator given by the weighted average over neighbor items. Examples are omitted if the item has fewer than 5 neighbors or if fewer than 90% of its neighbors are rated by the same user. In most examples the drift is quite small, supporting the linearity assumption that the second derivative of a user's rating function vanishes on most items.

In each dataset, there are hundreds of user-item pairs whose second derivative can be calculated from observed ratings. The statistics of the obtained second derivatives are reported in Fig. 1. As shown in the figure, for most examples the calculated second derivative is almost zero, and few examples have a second derivative far from zero. The number of such examples decreases exponentially with the distance from zero (note that the vertical axis is on a logarithmic scale). The results support the linearity assumption on the shape of a user's rating function, which later helps us accurately recover the whole rating function from an observed part of the ratings.
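The example-collection step can be sketched as follows. This is our reading of the procedure, not the authors' code: we assume ratings are stored as a per-user dictionary, that each item's neighbor set is a dictionary of positive similarity weights, and we keep a pair only if at least 5 neighbor ratings and at least 90% of the neighbors are observed (the thresholds given for Figure 1). Each returned value is the weighted neighbor average minus the rating, i.e., $\nabla^2 r_u(i)$; a histogram of these values is what Figure 1 plots.

```python
def second_derivative_samples(ratings, neighbors, min_observed=5, min_coverage=0.9):
    """Collect (user, item, nabla^2 r_u(i)) for pairs whose neighbourhood is well observed.

    ratings:   {user: {item: rating}}  (observed ratings only)
    neighbors: {item: {neighbour_item: similarity_weight > 0}}
    """
    samples = []
    for user, rated in ratings.items():
        for item, r in rated.items():
            nbrs = neighbors.get(item, {})
            observed = {j: w for j, w in nbrs.items() if j in rated}
            # Skip pairs where the second derivative would be unreliable.
            if len(observed) < min_observed or len(observed) < min_coverage * len(nbrs):
                continue
            avg = sum(w * rated[j] for j, w in observed.items()) / sum(observed.values())
            samples.append((user, item, avg - r))  # nabla^2 r_u(i) = average - r_u(i)
    return samples
```

Only the weighted average over observed neighbors is used, which is why the coverage threshold matters: with 90% or more of the neighbors observed, this average is close to the one over all neighbors.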

2.3 Leveraging the linearity assumption 

As discussed earlier, the kNN collaborative filtering technique (denoted as kNN in the rest of the paper) leverages the linearity assumption to estimate an unobserved rating. However, it calculates the second derivative with the observed neighbor ratings only, instead of the observed and estimated ratings of all neighbors:

$$\hat r_u(i) = \frac{\sum_{j \in N(i) \cap \mathcal{O}(u)} w(i,j)\, r_u(j)}{\sum_{j \in N(i) \cap \mathcal{O}(u)} w(i,j)},$$

where $\mathcal{O}(u)$ denotes the set of items that user u has rated. The above calculation is an approximation and can be unreliable when observations are sparse, which is quite common in a typical recommender system.

3 In practice, since fully observed examples are too few to support a solid statistical conclusion, we search for examples $r_u(i)$ such that user u rates item i and at least 90% of its neighbor items.

An example is shown in Figure 2. In the example network of 4 items, the two left items, in grey, are observed to be rated 5 stars and 3 stars respectively, while the two right items, in white, are to be predicted. As shown in Figure 2(a), kNN collaborative filtering assumes that Equation (1) holds on the unrated items, drawn as circles. Each unobserved rating is estimated from its observed neighbors only. The upper right item has only one observed neighbor, the upper left item, and therefore its estimate simply equals 5 stars. The estimate is bounded by the 5-star rating on its observed neighbor.
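The kNN estimator above can be sketched on the Figure 2 example. The figure itself is not reproduced here, so the network below is our reconstruction: a 4-cycle (upper left, upper right, lower right, lower left, back to upper left) with uniform weights, which is consistent with the numbers reported for all three approaches in this section. The position names UL, UR, LR and LL are labels we introduce for illustration.

```python
from fractions import Fraction

# Our reconstruction of the Figure 2 network (uniform weights on a 4-cycle).
FIG2 = {
    "UL": {"UR": 1, "LL": 1},
    "UR": {"UL": 1, "LR": 1},
    "LR": {"UR": 1, "LL": 1},
    "LL": {"LR": 1, "UL": 1},
}
FIG2_OBSERVED = {"UL": 5, "LL": 3}


def knn_estimate(adj, observed):
    """kNN: weighted average over *observed* neighbours only, for each unrated item."""
    estimates = {}
    for item, nbrs in adj.items():
        if item in observed:
            continue
        obs = {j: w for j, w in nbrs.items() if j in observed}
        if not obs:
            estimates[item] = None  # no observed neighbour: kNN gives no estimate
            continue
        total = sum(Fraction(w) * observed[j] for j, w in obs.items())
        estimates[item] = total / sum(obs.values())
    return estimates


print(knn_estimate(FIG2, FIG2_OBSERVED))  # UR -> 5, LR -> 3
```

Each estimate simply copies the single observed neighbor, so it can never leave the range of the observed neighbor ratings.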

The heat conduction process [31] (denoted as HCP in the rest of the paper) points out this shortcoming of kNN and improves on it by calculating the second derivative with all neighbor ratings, observed or unobserved. To break the deadlock in which a pair of unobserved neighboring items each wait for the other to be estimated first, the approach estimates all unobserved ratings simultaneously by solving a linear system consisting of Equation (1) on all unobserved items. However, requiring Equation (1) to hold on all unobserved items is a strong assumption and therefore limits its performance.

As shown in Figure 2(b), HCP also assumes that Equation (1) holds on the unrated items, drawn as circles. Unlike kNN, it estimates an unobserved rating from both observed and unobserved neighbors. The predicted rating of the upper right item is calculated from its two neighbors, the observed upper left item and the unobserved lower right item. Solving Equation (1) on the two right items simultaneously yields ratings of 4.3 stars and 3.7 stars respectively. Both estimates are bounded within the range [3, 5].
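HCP's linear system can be solved in several ways. The sketch below uses plain Gauss-Seidel sweeps, repeatedly replacing each unrated item's value by the weighted average of all its neighbors until the values settle; the solver choice is ours, and convergence assumes that every group of connected unrated items touches at least one observed item. The network is the same reconstructed 4-cycle as in the kNN sketch (our assumption), redefined so this block runs on its own.

```python
# Our reconstruction of the Figure 2 network (uniform weights on a 4-cycle).
FIG2 = {
    "UL": {"UR": 1, "LL": 1},
    "UR": {"UL": 1, "LR": 1},
    "LR": {"UR": 1, "LL": 1},
    "LL": {"LR": 1, "UL": 1},
}
FIG2_OBSERVED = {"UL": 5, "LL": 3}


def hcp_estimate(adj, observed, sweeps=200):
    """HCP: make Equation (1) hold on every unrated item, using *all* neighbours."""
    start = sum(observed.values()) / len(observed)  # initial guess for unrated items
    r = {v: float(observed[v]) if v in observed else start for v in adj}
    for _ in range(sweeps):
        for item, nbrs in adj.items():
            if item in observed:
                continue  # observed ratings are never changed
            r[item] = sum(w * r[j] for j, w in nbrs.items()) / sum(nbrs.values())
    return {v: r[v] for v in adj if v not in observed}


est = hcp_estimate(FIG2, FIG2_OBSERVED)
print({k: round(v, 2) for k, v in est.items()})  # {'UR': 4.33, 'LR': 3.67}
```

The exact solution is 13/3 and 11/3 stars, matching the 4.3 and 3.7 stars above; both stay inside [3, 5], the range of the observed ratings.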

In our view of scalar function recovery (denoted as SFR in the rest of the paper), in order to estimate the second derivative as accurately as possible, the second derivative is also calculated using all neighbor ratings, observed or unobserved. Similarly, the recovery process solves a set of instances of Equation (1) simultaneously to avoid the mutual-waiting deadlock. Unlike HCP, we expect Equation (1) to hold on most items, observed or unobserved, rather than on all unobserved items. The distinct benefit of our method is explained as follows.

Since an item with a non-zero second derivative is probably rated higher than its neighbor items (a local maximum of the rating function) or lower than them (a local minimum), it might represent what a user loves or hates the most within a local range of dozens of items. The rating on such an item could be unobserved for many reasons. For example, a user may be too lazy to label her favorite movie on a website because she is registered on a dozen movie websites, or she may be reluctant to post a rating for an extremely boring book. By allowing Equation (1) not to hold on an unobserved item, we keep the possibility of viewing that item as a local favorite or dislike. Symmetrically, since an unobserved item might be the local maximum or minimum, an observed item need not be one, and its second derivative may well be zero. We therefore do not exclude the possibility that Equation (1) holds on an observed item.

Figure 2: How the three approaches leverage the linearity assumption. Four items form an item-item network. The two left items are observed to be rated 5 stars and 3 stars respectively, drawn in grey. The two right items are to be predicted, drawn in white. Different approaches treat these items as sources (drawn as boxes) or non-sources (drawn as circles), where a non-source item is expected to satisfy the linearity assumption. A solid directed edge indicates that the rating on the tail is used to calculate the second derivative of the rating on the head, while a dashed arrow indicates that the rating on the tail is not involved in the calculation on the head. (a) kNN collaborative filtering calculates the second derivative on unrated items only, based on observed neighbors. (b) HCP calculates the second derivative on unrated items only, based on all neighbors. (c) SFR calculates the second derivative on rated and unrated items except a few sources, based on all neighbors. SFR makes use of many more neighborhood relations than kNN and HCP.

As shown in Figure 2(c), SFR assumes that Equation (1) holds on only some items. For example, we take the upper left and lower right items as sources, drawn as boxes. The remaining two items, drawn as circles, are not considered sources and are expected to satisfy Equation (1). The missing ratings are predicted by simultaneously solving Equation (1) on the two non-source items, whose second derivatives are calculated with both observed and unobserved neighbors, as the arrows indicate. The results are 3 stars and 1 star respectively, not bounded by the range of observed ratings.
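Once the sources are fixed, SFR's estimate is again the solution of a linear system, now with Equation (1) imposed on the non-source items, observed or not. The sketch below solves it exactly with Gauss-Jordan elimination on fractions, using the reconstructed 4-cycle from the kNN sketch (our assumption about the figure) and the sources chosen in Figure 2(c). Choosing the sources is the hard part that Problem (4) addresses; here they are fixed by hand, and the helper names are ours.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square linear system a x = b exactly (Gauss-Jordan elimination)."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def solve_linearity(adj, observed, linear_items):
    """Unrated values making Eq. (1) hold on linear_items (one equation per unknown)."""
    unknown = [v for v in adj if v not in observed]
    col = {v: k for k, v in enumerate(unknown)}
    a, b = [], []
    for i in linear_items:
        d = sum(adj[i].values())
        # Row of r_i - sum_j w(i,j) r_j / d = 0, known terms moved to the right-hand side.
        coef = {j: -Fraction(w) / d for j, w in adj[i].items()}
        coef[i] = coef.get(i, 0) + 1
        row, rhs = [Fraction(0)] * len(unknown), Fraction(0)
        for node, c in coef.items():
            if node in col:
                row[col[node]] += c
            else:
                rhs -= c * observed[node]
        a.append(row)
        b.append(rhs)
    return dict(zip(unknown, solve(a, b)))


# Our reconstruction of the Figure 2 network (uniform weights on a 4-cycle).
FIG2 = {
    "UL": {"UR": 1, "LL": 1},
    "UR": {"UL": 1, "LR": 1},
    "LR": {"UR": 1, "LL": 1},
    "LL": {"LR": 1, "UL": 1},
}
FIG2_OBSERVED = {"UL": 5, "LL": 3}
# Figure 2(c): UL and LR are sources, so Equation (1) is imposed on UR and LL only.
print(solve_linearity(FIG2, FIG2_OBSERVED, ["UR", "LL"]))  # UR -> 3, LR -> 1
```

Passing the unrated items ["UR", "LR"] as `linear_items` instead reproduces the HCP system, which gives 13/3 and 11/3.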

In this demo all approaches predict with two instances of Equation (1). In practice, since our approach seeks a solution that minimizes the number of sources (discussed in Section 3), the number of sources may be even smaller than the number of observed items, and therefore our approach may use more equations than kNN and HCP do. More equations provide more evidence for accurately recovering a scalar function.



Table 1: How the three approaches leverage the linearity assumption.

Approach | Second derivative calculated with | Equation (1) required on
kNN | Observed neighbors | All unobserved items
HCP | All neighbors | All unobserved items
SFR | All neighbors | Most items (observed or unobserved)



2.4 Connection to the rating bound problem 

Let us analyze the three views, kNN, HCP and our scalar 
function recovery, from the perspective of the rating bound 
problem. 

kNN: locally bounded. A kNN prediction of an unobserved rating is bounded by the ratings of observed neighbor items, since kNN estimates an unobserved rating as the weighted average of the ratings of observed neighbors. For example, if all neighbors of an unrated item are observed with 1-star or 2-star ratings by the same user, the item has no chance of being predicted a 3-star rating. Therefore kNN can never correctly predict the rating on that item if it happens to be the user's favorite movie.

HCP: globally bounded. An HCP prediction of an unobserved rating is not bounded by its neighbor items, since it calculates the weighted average over all neighbors, observed or unobserved. However, HCP requires the second derivative to vanish on all unobserved items, so only observed items can have non-zero second derivatives. As discussed earlier, local minima and maxima must have non-zero second derivatives. As a result, all predictions have a global bound, i.e., they lie within the range between the lowest and the highest observed ratings. In the same example, even though all neighbors of the unrated item are observed with 1-star or 2-star ratings by the same user, it still has a chance of being predicted a 3-star rating as long as she has posted a 4-star rating on any item. HCP thus improves on kNN by expanding the bounds from a local range to a global range. However, it still has no chance of producing a 5-star prediction if the user has never posted a rating higher than 4 stars.

SFR: not bounded. In our approach, the prediction of an unobserved rating is not bounded. Since the linearity assumption does not require all unobserved ratings to satisfy Equation (1), a prediction might be higher or lower than all neighboring and distant observations. Moreover, we also allow an observed rating to satisfy Equation (1). Even the highest observed rating has a chance to satisfy Equation (1); in that case, unless all its neighbors share its rating, at least one neighbor must be assigned a higher rating than it, which breaks the rating bound. Therefore our approach does not suffer from the rating bound problem and has the potential to perform better, especially on items whose ground truth is indeed higher or lower than all observed ratings.
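The difference between the global bound of HCP and the unbounded SFR can be seen on a hypothetical 5-item chain whose true ratings rise linearly from 1 to 5, with only items 2 and 4 observed (ratings 2 and 4). The chain, its ratings and the choice of sources are ours for illustration, and the exact solver repeats the Figure 2 sketch so that this block runs on its own. HCP returns 2, 3 and 4 for items 1, 3 and 5, never leaving the observed range, whereas imposing Equation (1) on items 2, 3 and 4, i.e., taking the two chain ends as sources, recovers 1, 3 and 5. In SFR the sources would come from solving Problem (5) rather than being fixed by hand.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square linear system a x = b exactly (Gauss-Jordan elimination)."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def solve_linearity(adj, observed, linear_items):
    """Unrated values making Eq. (1) hold on linear_items (one equation per unknown)."""
    unknown = [v for v in adj if v not in observed]
    col = {v: k for k, v in enumerate(unknown)}
    a, b = [], []
    for i in linear_items:
        d = sum(adj[i].values())
        coef = {j: -Fraction(w) / d for j, w in adj[i].items()}
        coef[i] = coef.get(i, 0) + 1
        row, rhs = [Fraction(0)] * len(unknown), Fraction(0)
        for node, c in coef.items():
            if node in col:
                row[col[node]] += c
            else:
                rhs -= c * observed[node]
        a.append(row)
        b.append(rhs)
    return dict(zip(unknown, solve(a, b)))


# Hypothetical chain 1 - 2 - 3 - 4 - 5 with unit weights; true ratings are 1..5.
CHAIN = {1: {2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1, 4: 1}, 4: {3: 1, 5: 1}, 5: {4: 1}}
OBSERVED = {2: 2, 4: 4}

hcp = solve_linearity(CHAIN, OBSERVED, [1, 3, 5])  # Eq. (1) on all unrated items
sfr = solve_linearity(CHAIN, OBSERVED, [2, 3, 4])  # Eq. (1) on all but the two ends
print({k: int(v) for k, v in hcp.items()})  # {1: 2, 3: 3, 5: 4}
print({k: int(v) for k, v in sfr.items()})  # {1: 1, 3: 3, 5: 5}
```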

3. INFERENCE 

In this section we introduce our approach to recovering a scalar function using the property of its second derivative.

The scalar function recovery process is formulated as the following inference job. Given a user u, we observe the value of her scalar function $r_u(\cdot)$ on the set of items $\mathcal{O}(u)$ she has rated, and are required to estimate the value of $r_u(\cdot)$ on the other items. We denote by $\tilde r_u(i)$ an observed rating that user u posts on item i, and by $\hat r_u(i)$ an estimated rating. To complete the job, we search for a feasible solution with a minimal number of sources:

$$\hat R_u = \arg\min_{R_u} \|\nabla^2 R_u\|_0 \quad \text{s.t.} \quad R_{u,i} = \tilde r_u(i),\ \forall i \in \mathcal{O}(u); \qquad R_{u,i} \in [c_l, c_h],\ \forall i, \qquad (4)$$
where $R_u$ is again a vector consisting of $r_u(\cdot)$ values. Since our approach does not bound the prediction range by the observations, one needs to specify the legal range $[c_l, c_h]$ of a rating. A typical recommender system such as MovieLens or Netflix requires a rating to lie in [1, 5], while in Yahoo! Music the range is [0, 100]. The $\ell_0$-norm of $\nabla^2 R_u$, i.e., the number of its non-zero entries, counts the number of sources. The solution is obtained by minimizing the number of sources under the hard constraints that the observations are preserved and all ratings lie within the predefined range. Unfortunately, the $\ell_0$-norm of a vector is difficult to minimize, since it is piecewise constant and provides no useful gradient. For computational ease, we replace Problem (4) with a slightly different form:



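$$\hat R_u = \arg\min_{R_u} \|\nabla^2 R_u\|_p \quad \text{s.t.} \quad R_{u,i} = \tilde r_u(i),\ \forall i \in \mathcal{O}(u); \qquad R_{u,i} \in [c_l, c_h],\ \forall i, \qquad (5)$$

where 0 < p < 1 is a parameter and $\|\cdot\|_p$ denotes the $\ell_p$-norm of a vector w (strictly speaking a quasi-norm when p < 1),

$$\|w\|_p = \Big(\sum_k |w_k|^p\Big)^{1/p},$$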

where $w_k$ is the k-th element of w. Since the gradient of the $\ell_p$-norm of a vector is convenient to calculate, we can easily apply standard optimization algorithms such as gradient descent to find the solution. The gradient of the objective function is

$$\nabla_{\mathcal{U}} \|\nabla^2 R_u\|_p = -\Big(\sum_k |(\nabla^2 R_u)_k|^p\Big)^{\frac{1}{p}-1} L_{\mathcal{U}}^\top v,$$

where $\nabla_{\mathcal{U}}$ indicates that we only calculate the gradient with respect to the unobserved ratings $R_{u,i}$, $i \notin \mathcal{O}(u)$; $(\nabla^2 R_u)_k$ is the k-th element of the vector $\nabla^2 R_u$; $L_{\mathcal{U}}$ consists of the columns of the Laplacian matrix L corresponding to unobserved items, and $L_{\mathcal{U}}^\top$ is its transpose; and v is a vector of the same size as $\nabla^2 R_u$ whose k-th element is $v_k = |(\nabla^2 R_u)_k|^{p-1}\,\mathrm{sign}((\nabla^2 R_u)_k)$. The minus sign comes from $\nabla^2 = -L$. In the experiments below we set p = 1/2. We discuss the selection of p in Section 3.4.
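A minimal sketch of this optimization follows, under our own assumptions, none of which are specified in the text: dense matrices in pure Python; $|x|$ smoothed as $\sqrt{x^2 + \varepsilon}$ so that $|x|^{p-1}$ stays finite where the second derivative is zero; a fixed step size and iteration count; unrated items initialized to the middle of the legal range; and clipping to $[c_l, c_h]$ as the projection step. Because p < 1 makes the objective non-convex, plain projected gradient descent only finds a local solution, and the result depends on these choices. The helper names are hypothetical.

```python
def laplacian(adj, nodes):
    """Dense L = I - D^{-1} W (list of rows) for a weighted adjacency dict."""
    idx = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    lap = [[0.0] * n for _ in range(n)]
    for v in nodes:
        i = idx[v]
        d = sum(adj[v].values())
        lap[i][i] = 1.0
        for u, w in adj[v].items():
            lap[i][idx[u]] -= w / d
    return lap


def objective_and_grad(lap, r, unknown, p=0.5, eps=1e-8):
    """Smoothed ||nabla^2 R||_p and its gradient with respect to r[i], i in unknown.

    |x| is replaced by sqrt(x^2 + eps) so that |x|^(p-1) stays finite at x = 0;
    this smoothing is our assumption, not part of Problem (5).
    """
    n = len(r)
    x = [-sum(lap[k][j] * r[j] for j in range(n)) for k in range(n)]  # nabla^2 R = -L R
    a = [(xk * xk + eps) ** 0.5 for xk in x]  # smoothed |x_k|
    s = sum(ak ** p for ak in a)
    v = [ak ** (p - 1) * xk / ak for xk, ak in zip(x, a)]  # smoothed |x_k|^(p-1) sign(x_k)
    scale = s ** (1.0 / p - 1.0)
    # d x_k / d r_i = -L[k][i], hence the leading minus sign.
    grad = [-scale * sum(lap[k][i] * v[k] for k in range(n)) for i in unknown]
    return s ** (1.0 / p), grad


def recover(adj, observed, lo, hi, p=0.5, step=0.05, iters=2000):
    """Projected gradient descent on the smoothed Problem (5)."""
    nodes = list(adj)
    lap = laplacian(adj, nodes)
    unknown = [k for k, v in enumerate(nodes) if v not in observed]
    start = (lo + hi) / 2.0  # initial guess for unrated items
    r = [float(observed[v]) if v in observed else start for v in nodes]
    for _ in range(iters):
        _, grad = objective_and_grad(lap, r, unknown, p)
        for i, g in zip(unknown, grad):
            r[i] = min(hi, max(lo, r[i] - step * g))  # step, then clip to [c_l, c_h]
    return dict(zip(nodes, r))
```

Observed ratings are never updated, so the equality constraints of Problem (5) hold by construction, and clipping enforces the legal range after every step.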

3.1 Sources of a scalar function 

Why do we solve the scalar function recovery task by minimizing the number of sources? On a finite connected network, a scalar function whose second derivative is zero everywhere must be constant, which is obviously unrealistic in a real recommender system. Therefore, for any user, there must exist some items on which the second derivative of her scalar function is non-zero. We claim that there should be as few sources as possible, and that the function can be recovered by minimizing the number of such items. This claim is reasonable for the following two reasons.

Intuitively, every user has several favorite movies or books that she rates higher than their neighbor items (local maxima of the rating function). Symmetrically, she might also have several dislikes that she rates lower than their neighbor items (local minima). Since her ratings on those items cannot equal the weighted average over neighbor items, the second derivative cannot vanish on them. Therefore, the sources form a superset of the items she loves or hates the most. By minimizing the number of sources, we limit the number of favorites and dislikes in the estimated interest distribution, capturing the intuition that a user is quite unlikely to have thousands of "favorite" movies.

Mathematically, if we view the scalar function as a scalar field defined on an item-item network, the second derivative is the divergence of its gradient field, and the sources are the points where the scalar function changes its gradient. The more frequently a scalar function changes its gradient, the more complex it can be. The number of sources can thus be considered a measure of function complexity, or structural risk. Given a partial observation of a scalar function, there may exist infinitely many feasible solutions that fit the observations. Minimizing the number of sources thus helps us find, among all feasible solutions, the one with the minimal risk of over-fitting.
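The opening claim of this subsection, that a rating function whose second derivative vanishes everywhere must be constant, can be sketched as a discrete maximum principle, assuming a finite, connected network with positive weights. Part (a) starts from an item $i^*$ where r attains its maximum M; repeating the step from each neighbor reaches every item because the network is connected, so r is constant. Part (b) additionally assumes symmetric weights, $w(i,j) = w(j,i)$; since every $D_{ii} > 0$, a non-constant r (which by (a) has some non-zero $\nabla^2 r(i)$) must have both a positive and a negative entry, i.e., at least two sources.

```latex
\nabla^2 r(i) = \frac{1}{D_{ii}} \sum_{j \in N(i)} w(i,j)\,\bigl(r(j) - r(i)\bigr)
\qquad \bigl(\text{since } \nabla^2 = -L = D^{-1}W - I\bigr)

\text{(a)}\quad \nabla^2 r \equiv 0,\ M = r(i^*) = \max_i r(i)
\;\Rightarrow\; 0 = \sum_{j \in N(i^*)} w(i^*,j)\,\bigl(M - r(j)\bigr),\quad M - r(j) \ge 0
\;\Rightarrow\; r(j) = M \ \ \forall j \in N(i^*)

\text{(b)}\quad \sum_i D_{ii}\,\nabla^2 r(i)
= \sum_i \sum_{j \in N(i)} w(i,j)\,r(j) - \sum_i \sum_{j \in N(i)} w(i,j)\,r(i) = 0
```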



3.2 Case study on toy data 

We build an artificial dataset to demonstrate how our approach works. As shown in Figure 3(a), the dataset contains a user's ratings on 26 items. The items, represented by circles and boxes, are connected by edges representing item relations (e.g., similarity between two items) with uniform weights. Ratings are labeled on the items. The items the user loves and dislikes the most are located at the top and the bottom respectively, and play the role of sources (shown as boxes). Other items are assigned ratings consistent with the linearity assumption (shown as circles). Ratings on 8 items are observed (labeled in grey), and ratings on the other items are to be estimated (labeled in white). An accurately estimated rating is labeled in green, while an inaccurate estimate is labeled in orange.

We first run kNN and HCP on the toy dataset. kNN treats the observed ratings as sources and estimates each unobserved rating as the weighted average of the ratings on its observed neighbors. As shown in Figure 3(b), kNN estimates the unobserved ratings using 12 edges. Each unobserved rating has at most one observed neighbor, so the average over observed neighbors is an inaccurate estimator of the average over all neighbors. As a result, kNN estimates all unobserved ratings inaccurately. Moreover, kNN cannot produce estimates for unobserved items with no observed neighbors, which are marked with question marks.

HCP also treats the observed ratings as sources and estimates unobserved ratings by requiring the linearity assumption to hold on all unobserved items. As shown in Figure 3(c), HCP estimates the unobserved ratings using 48 edges. It outperforms kNN by correctly estimating the unobserved ratings of the 4 items in the middle, whose neighbors are either observed or easy to estimate correctly. However, due to the rating bound problem, HCP cannot accurately estimate the unobserved ratings of the 7 items near the top, whose real ratings are higher than all observations. A similar problem occurs on the 7 items near the bottom.

Our approach does not assume that the observed ratings are sources. Instead, it seeks a solution that (approximately) minimizes the number of unknown sources. The search converges to a solution that (correctly) takes the top and bottom items as the two sources. It uses 60 edges to calculate the unobserved ratings and accurately estimates all of them, as shown in Figure 3(d). Although the observed ratings are confined to the range [4, 7], our approach does not suffer from the rating bound problem and accurately recovers the ratings outside that range, especially those on the two sources.







3.3 Connection with conventional approaches 

The conventional approaches such as kNN and HCP could 
be considered as special cases of our approach. 

The kNN approach is a special case of our approach with two additional assumptions. First, the second derivative vanishes on all unobserved items. Our linearity assumption only requires the second derivative to vanish on most items, whether observed or unobserved, so the assumption of kNN is clearly stronger, and its feasible solution set is therefore a subset of ours. Second, a calculation over partially observed neighbors provides a good estimator of the second derivative. In kNN, the weighted average is computed not over all neighbors but only over the observed ones. This implies the belief that an estimator based on incomplete information provides an accurate approximation.
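The kNN estimator described above can be sketched as follows. This is a minimal illustration rather than the exact estimator used in the experiments; the function name and the edge-dictionary layout are hypothetical. Because every edge weight is positive (the item-item network only links positively correlated items), the estimate is a convex combination of the user's observed neighbor ratings and can never leave their range, which is precisely the rating bound problem.

```python
def knn_predict(user_ratings, weights, item):
    """Item-based kNN estimate for one user, using OBSERVED neighbors only.

    user_ratings: {item: rating} observed for this user.
    weights: {(i, j): w} undirected item-item edges with positive weights.
    Returns None when no neighbor of `item` is observed.
    """
    num = den = 0.0
    for (i, j), w in weights.items():
        if i == item:
            other = j
        elif j == item:
            other = i
        else:
            continue
        if other in user_ratings:  # unobserved neighbors are simply skipped
            num += w * user_ratings[other]
            den += w
    return num / den if den > 0 else None


weights = {("x", "a"): 1.0, ("x", "b"): 3.0, ("y", "z"): 1.0}
print(knn_predict({"a": 4.0, "b": 8.0}, weights, "x"))  # 7.0, inside [4, 8]
print(knn_predict({"a": 4.0, "b": 8.0}, weights, "y"))  # None: no observed neighbor
```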

Figure 3: (Color online) Toy data demonstrating three approaches. Each node represents an item. An item shown in a box is a source (or is considered a source), while an item shown in a circle is not a source (or is not considered a source). A user's ratings are labeled on the items. An observed rating is labeled in grey, while an unobserved rating is labeled in white. An accurately estimated rating is labeled in green, while an inaccurate estimation is labeled in orange. Each edge represents a relation between two items. An undirected edge in (a) indicates that two items are related. A solid directed edge in (b), (c) and (d) indicates that the edge is leveraged; specifically, the rating on the tail is used to calculate the second derivative of the rating on the head. A dashed directed edge indicates that the edge is not leveraged.

HCP extends kNN by removing the second assumption. It requires the weighted average to be calculated over all neighbors, observed and unobserved, in order to achieve a more accurate estimation. However, it keeps the first assumption, which requires the second derivative to vanish on all unobserved items; this makes its feasible solution set also a subset of ours.
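HCP's requirement that every unobserved rating equal the weighted average of all its neighbors is a discrete harmonic condition. The sketch below solves it with Gauss-Seidel sweeps, a swapped-in iterative solver rather than the Green-function solution of Zhang et al. [31]; the function name, the starting value and the stopping rule are our own assumptions. On a path a-x-y-b with a and b observed, it recovers the linear interpolation between them, and, as with kNN, no estimate can leave the range of the observed ratings.

```python
def hcp_predict(observed, weights, items, max_sweeps=10_000, tol=1e-12):
    """Harmonic estimate: each unobserved item equals the weighted mean of ALL neighbors.

    observed: {item: rating} kept fixed; weights: {(i, j): w} undirected, w > 0.
    Assumes every unobserved item is connected to some observed item.
    """
    adj = {i: [] for i in items}
    for (i, j), w in weights.items():
        adj[i].append((j, w))
        adj[j].append((i, w))
    unknown = [i for i in items if i not in observed and adj[i]]
    start = sum(observed.values()) / len(observed)  # initial guess for unknowns
    f = {i: observed.get(i, start) for i in items}
    for _ in range(max_sweeps):
        change = 0.0
        for i in unknown:  # Gauss-Seidel: reuse values updated in this sweep
            new = sum(w * f[j] for j, w in adj[i]) / sum(w for _, w in adj[i])
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return {i: f[i] for i in unknown}


path = {("a", "x"): 1.0, ("x", "y"): 1.0, ("y", "b"): 1.0}
print(hcp_predict({"a": 4.0, "b": 7.0}, path, ["a", "x", "y", "b"]))  # x ~ 5, y ~ 6
```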

To summarize, kNN and HCP are both built on assumptions stronger than ours, which makes them special cases of our approach. Since their feasible solution sets are subsets of ours, our approach has the potential to achieve better performance.

3.4 The selection of parameter p 

The selection of p is an interesting topic when optimizing Equation (5) to recover unobserved ratings, since different values of p may lead to different predictions. The original Equation (4) corresponds to the case p = 0, which is straightforward to state but difficult to solve. It yields an exact solution if the network is small enough for an exhaustive search, but it cannot be solved stably within an acceptable time once the problem grows larger. A value p ∈ (0, 1) provides a good approximation to the p = 0 case but is much easier to optimize, since its gradient has an explicit form; that is why we choose p = 1/2 in practice. When p > 1, the objective function does not lead to a sparse solution with very few sources. How the choice of p affects the approximation remains an open question.
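The effect of p on sparsity can be checked on two hypothetical vectors of second-derivative values (illustrative numbers, not data from the paper): a sparse one with a single nonzero entry, i.e. one source, and a dense one with the same total magnitude spread over two items. The helper names are our own; `lp_grad` writes out the explicit derivative of |v|^p that makes p ∈ (0, 1) convenient to optimize, and it is undefined at v = 0.

```python
def lp_cost(x, p):
    """Sum of |x_i|^p; for p == 0 this counts the nonzero entries (the l0 "norm")."""
    if p == 0:
        return sum(1 for v in x if v != 0)
    return sum(abs(v) ** p for v in x)


def lp_grad(v, p):
    """Derivative of |v|^p for v != 0: p * sign(v) * |v|^(p - 1)."""
    return p * (1 if v > 0 else -1) * abs(v) ** (p - 1)


sparse, dense = [2.0, 0.0], [1.0, 1.0]  # same total magnitude, different sparsity
for p in (0, 0.5, 1, 2):
    print(p, lp_cost(sparse, p), lp_cost(dense, p))
```

For p = 0 and p = 1/2 the sparse vector is cheaper (1 versus 2, and √2 ≈ 1.41 versus 2); at p = 1 the two tie at 2; at p = 2 the dense vector wins (2 versus 4). This matches the remark that p > 1 does not favor solutions with few sources.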

4. EXPERIMENTAL VALIDATION 

In this section we conduct experiments to study the rating bound problem and to evaluate the performance of different approaches on it.

4.1 Data collection 

We empirically evaluate our approach on three datasets: Douban, Goodreads and MovieLens. The Douban and Goodreads datasets were crawled from two online collection websites⁴, where users rate millions of movies, books and music. After crawling the data, we removed inactive users, since their inactivity may lead to unreliable statistics. The MovieLens dataset has been a widely used benchmark over the past decade. Statistics of the datasets are reported in Table 2.



Table 2: Datasets description

                     Douban     Goodreads   MovieLens
Number of users      32,384     32,907      6,040
Number of items      14,923     13,548      3,706
Number of ratings    345,293    168,926     1,000,209



For each dataset, 80% of the examples (ratings) are used as the training set and the remaining 20% as the testing set. We calculate Pearson's correlation coefficient between the training-set ratings of each pair of items, and build an item-item network in which two items are linked, with a weight equal to their correlation coefficient, if the correlation is above 0.2 (Douban and Goodreads) or 0.5 (MovieLens).
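The network construction can be sketched as follows. This is a minimal sketch under assumptions the text does not pin down: the correlation for a pair of items is computed over the users who rated both items in the training set, and pairs with fewer than two co-raters or zero rating variance are skipped. The function names and the {user: {item: rating}} layout are hypothetical, and the naive loop over all item pairs would be replaced by a sparse-matrix computation at the scale of Table 2.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation of two equal-length rating lists, or None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def build_item_network(train, threshold):
    """train: {user: {item: rating}}. Returns {(i, j): weight} for linked item pairs."""
    by_item = {}  # invert to {item: {user: rating}} so co-raters can be intersected
    for user, ratings in train.items():
        for item, r in ratings.items():
            by_item.setdefault(item, {})[user] = r
    edges = {}
    for i, j in combinations(sorted(by_item), 2):
        common = sorted(by_item[i].keys() & by_item[j].keys())  # assumption: co-raters only
        rho = pearson([by_item[i][u] for u in common], [by_item[j][u] for u in common])
        if rho is not None and rho > threshold:
            edges[(i, j)] = rho
    return edges


train = {"u1": {"a": 1, "b": 2, "c": 5}, "u2": {"a": 2, "b": 3, "c": 4}, "u3": {"a": 3, "b": 5, "c": 1}}
print(build_item_network(train, threshold=0.2))  # only the positively correlated pair (a, b)
```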

4.2 Studying the rating bound problem 

To demonstrate how critical the rating bound problem is for a recommender system, we count the test examples that suffer from it, i.e., whose rating is either higher or lower than all observed ratings by the same user on the neighboring items. In addition, we run the kNN approach to predict the ratings in the testing set, in order to examine how much those examples contribute to the prediction error.

As reported in Table 3, between 15.6% and 17.5% of the examples in the testing set of each dataset suffer from the rating bound problem. The real ratings of those examples are either higher or lower than all neighbor ratings in the training set, which means that the kNN approach has no chance to predict them correctly. Not surprisingly, although those examples are only about one sixth of all testing examples, they account for about 44% of the prediction error⁵ (43.9% to 44.9% across the three datasets), indicating that an example with the rating bound problem is much more difficult to predict correctly than one without it. Therefore, the ability to handle the rating bound problem is critical to the performance of a recommender system.
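The counting procedure and the error-contribution measure can be sketched as below. The function names are hypothetical, and treating an example with no observed neighbors as not rating-bound is our assumption, since the text does not say how such examples are handled. The share is taken over squared residuals, the measure used in Table 3.

```python
def bound_status(user_train, neighbors, rating):
    """Return "higher", "lower" or None for one test example of one user.

    user_train: {item: rating} of this user in the training set.
    neighbors: items linked to the test item in the item-item network.
    """
    observed = [user_train[j] for j in neighbors if j in user_train]
    if not observed:
        return None  # assumption: no observed neighbor -> not counted as rating-bound
    if rating > max(observed):
        return "higher"
    if rating < min(observed):
        return "lower"
    return None


def error_share(residuals, flagged):
    """Fraction of the total squared residual contributed by the flagged examples."""
    total = sum(r * r for r in residuals)
    part = sum(r * r for r, f in zip(residuals, flagged) if f)
    return part / total if total > 0 else 0.0


user_train = {"a": 4.0, "b": 6.0}
print(bound_status(user_train, ["a", "b", "c"], 7.0))  # "higher"
print(error_share([1.0, 2.0, 1.0], [False, True, False]))  # 4/6 of the squared error
```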



Table 3: The proportion of examples suffering from the rating bound problem in the testing set, as well as their contribution to the prediction error (measured in squared residual) calculated with a standard kNN approach.

              Rating higher than neighbors     Rating lower than neighbors
              Count      Error contribution    Count      Error contribution
Douban        7.17%      18.28%                8.40%      25.57%
Goodreads     6.55%      17.66%                9.12%      27.02%
MovieLens     11.43%     30.77%                6.08%      14.12%



4.3 Empirical results 

We test our approach on real datasets to examine its ability to solve the rating bound problem, with kNN and HCP as baselines. Based on the ratings in the training set and the item-item network, all approaches are used to predict those ratings in the testing set that encounter the rating bound problem, i.e., whose ground truth rating is higher or lower than all neighbor ratings in the training set. The predictions are evaluated with RMSE (root-mean-square error) and reported in Table 4, where "Higher" and "Lower" denote the examples whose ratings are, respectively, higher or lower than all observed neighbors, and "All" denotes their union. Our approach consistently shows a better ability to solve the rating bound problem across datasets, reducing the prediction error by up to 37% compared with kNN. The improvement is even more significant on items whose ratings are higher than those of their observed neighbors. Since an item rated higher than its neighbors is probably one of a user's positive interest centers, our approach is expected to achieve a much better user experience by accurately discovering a user's favorites.
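RMSE and the relative reductions quoted above can be recomputed from the values in Table 4; the snippet does only that arithmetic, and the dictionary layout is our own. The largest reduction versus kNN, (1.895 - 1.185)/1.895 ≈ 0.375, occurs on the MovieLens "Higher" subset, and in every dataset the reduction on "Higher" examples exceeds that on "Lower" ones.

```python
import math


def rmse(predictions, truths):
    """Root-mean-square error between predicted and ground-truth ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truths)) / len(truths))


# RMSE values transcribed from Table 4, as (kNN, HCP, SFR).
TABLE4 = {
    ("Douban", "All"): (1.532, 1.287, 1.269),
    ("Douban", "Higher"): (1.458, 1.086, 1.086),
    ("Douban", "Lower"): (1.594, 1.437, 1.408),
    ("Goodreads", "All"): (1.566, 1.404, 1.355),
    ("Goodreads", "Higher"): (1.524, 1.225, 1.211),
    ("Goodreads", "Lower"): (1.597, 1.521, 1.450),
    ("MovieLens", "All"): (1.849, 1.303, 1.263),
    ("MovieLens", "Higher"): (1.895, 1.204, 1.185),
    ("MovieLens", "Lower"): (1.760, 1.472, 1.397),
}


def reduction_vs_knn(row):
    """Relative RMSE reduction of SFR with respect to kNN."""
    knn, _hcp, sfr = row
    return (knn - sfr) / knn


for key, row in TABLE4.items():
    print(key, f"{reduction_vs_knn(row):.1%}")
best = max(TABLE4, key=lambda k: reduction_vs_knn(TABLE4[k]))
print("largest reduction:", best)
```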

Furthermore, to analyze performance on heterogeneous items, we divide the examples with the rating bound problem into subsets according to their real ratings, and report the prediction error on each subset in Figure 4. In the "higher" samples, both HCP and our approach greatly reduce the prediction error compared with


4 www.douban.com and www.goodreads.com



5 The prediction error is measured with the squared residual rather than the standard evaluation metric RMSE (root-mean-square error), since the share of error contributed by a subset of examples is not straightforward to state once a square root is taken.



[Figure 4 panels: (a) Douban (higher), (b) Goodreads (higher), (c) MovieLens (higher), (d) Douban (lower), (e) Goodreads (lower), (f) MovieLens (lower); x-axis: ground truth rating; each legend gives the overall RMSE of kNN, HCP and SFR, matching Table 4.]



Figure 4: (Color online) Prediction accuracy, measured by RMSE, on items that suffer from the rating bound problem. Three approaches are examined to predict unobserved ratings in the testing set whose ground truth ratings are higher (top figures) or lower (bottom figures) than all neighbor observations in the training set. Our scalar function recovery (SFR) approach effectively reduces the prediction error in these tasks, and its margin over HCP is largest when the ground truth rating is lower than all neighbor observations.



Table 4: Prediction error (RMSE) on examples encountering the rating bound problem. Our approach, SFR, consistently achieves the lowest prediction error across datasets and samples (tied with HCP on Douban, Higher), with a reduction of up to 37% compared with kNN collaborative filtering.

Dataset     Sample   kNN     HCP     SFR
Douban      All      1.532   1.287   1.269
            Higher   1.458   1.086   1.086
            Lower    1.594   1.437   1.408
Goodreads   All      1.566   1.404   1.355
            Higher   1.524   1.225   1.211
            Lower    1.597   1.521   1.450
MovieLens   All      1.849   1.303   1.263
            Higher   1.895   1.204   1.185
            Lower    1.760   1.472   1.397



kNN, and our approach slightly outperforms HCP. The reduction is almost the same across subsets, indicating that the advantage is consistent. The only exception is that our approach fails to outperform kNN on the subset of 2-star items (rated higher than their observed neighbors) in the Goodreads dataset; this subset contains only 6 examples, so the difference is not statistically significant. In the "lower" samples, the improvement is smaller, but our approach still outperforms both baselines in most cases. The improvement is more significant on highly rated examples.

The poor performance of kNN can be attributed to the way it estimates the second derivative from only partially observed neighbor ratings, while the significant improvement of HCP and our approach reveals the necessity of estimating the second derivative with all neighbors. Moreover, allowing an unobserved rating to violate Equation (1) makes it possible for our approach to discover previously unseen interest centers for each user, which leads to the further reduction in prediction error that our approach achieves over HCP. To summarize, the empirical results support our claim that our approach yields better predictions when encountering the rating bound problem, and therefore may be more suitable for discovering a user's interest centers.

5. RELATED WORKS 

Personalized recommender systems have been a hot topic 
in the research literature for a decade [1, 20, 16]. 

One of the earliest personalized approaches is the content-based method, which builds a profile for each user and each book as a vector of weights on different words. The approach estimates an unobserved rating with the dot product of a user vector and a book vector [2]. Although the approach is now seldom used and was soon replaced by collaborative filtering methods, because it is limited to scenarios where an item, such as a book, can be explicitly parsed, its dot-product form of a user vector and an item vector opened the way for latent factor models.

Because a content-based method solves half of the cold-start problem by building a profile for each incoming book, researchers have recently sought variations of it to alleviate the cold-start problem. Tag-based recommender systems were then developed to leverage user-generated tags to represent items, such as videos or music, that are difficult to parse explicitly [32, 26].

kNN collaborative filtering techniques were developed to address a limitation of content-based methods: items are not always explicitly parsable. This family of approaches is built on the so-called similarity assumption that if two users behaved similarly in the past, they will behave similarly in the future [19]. A user-based collaborative filtering technique calculates the similarity between two users with Pearson's correlation coefficient, and estimates a user's unobserved rating on a certain item as the weighted average of the ratings that similar users gave that item. An item-based version calculates item similarities and predicts in a symmetric way [5, 25, 6]. Recently, researchers have attempted to combine the two versions by simultaneously considering user-user and item-item similarities, and report better accuracy [30]. Pointing out that a kNN collaborative filtering technique computes the weighted average over observed neighbor ratings only, which may lead to inconsistency, Zhang et al. propose the heat conduction process, which estimates an unobserved rating as the weighted average of all neighbor ratings, whether observed or unobserved. By simultaneously solving for all unobserved ratings with a Green function, the method outperforms conventional kNN collaborative filtering methods in prediction accuracy [31]. A recent work seeks a unique set of global neighbors shared by all users and attempts to minimize the size of this set to achieve better accuracy and coverage [3].

Latent factor models represent each user and each item with a vector in a latent feature space, and estimate a rating with the dot product of a user vector and an item vector [7]. Unlike standard matrix factorization tasks, in which the whole target matrix is observed, in a recommender system only some entries of the rating matrix are observed; therefore, controlling the risk of over-fitting becomes a key point in inferring a latent factor model. Srebro et al. propose a maximum-margin constraint to control the structural risk represented by matrix rank [28], which was later combined with compressed sensing to find an accurate completion [4]. Salakhutdinov et al. introduce a probabilistic graphical model and control the user vectors and item vectors with a Gaussian prior [23, 22]. Koren controls the ℓ2-norm of user and item vectors as a regularization term in the loss function when fitting the observations [14, 15].

Because the two major schools, kNN collaborative filtering and latent factor models, have different advantages and disadvantages, many researchers attempt to combine them into a hybrid or ensemble model. A typical solution incorporates one as a regularization term in the other's framework [13, 29].

Resource projection passes user interests back and forth in a bipartite network of users and items, achieving an adaptive balance between accuracy and diversity and showing good scalability on huge data [33]. The restricted Boltzmann machine approach introduces a graphical model with a hidden layer to train user and movie profiles in an efficient and scalable way [24, 21].

With the recent explosion of social networking services, researchers have attempted to incorporate social relations into recommender systems to alleviate the sparsity and cold-start problems, for example by running the PageRank algorithm on a social network [12, 10], analyzing social influence in recommender systems [8, 9], or training a latent factor model to fit observed ratings and social networks simultaneously [11, 17, 18].

6. CONCLUSION 

In recommender systems, it is crucial to accurately discover a user's interest centers, i.e., the user's favorite items. Unfortunately, the commonly used neighborhood-based collaborative filtering methods fail to do so, as they suffer from the so-called rating bound problem: the rating these methods predict for an item is bounded by the observed ratings on neighboring items. As an interest center usually has a rating higher than the ratings of its observed neighbors, these methods cannot accurately predict its rating at all. To overcome this significant problem, we formulated information recommendation as the recovery of a scalar rating function, which we solved by optimizing the ℓ½-norm of its second derivative. Through carefully designed experiments on three real-world datasets, namely Douban, Goodreads and MovieLens, we validated the effectiveness of the proposed approach. In particular, we found that our approach reduces the prediction error by up to 37% when discovering interest centers, compared with the well-known kNN collaborative filtering technique.

In this paper, a scalar function is defined for each user independently. Encouraged by recent research on combining the traditional user-based and item-based collaborative filtering techniques, it would be interesting to explore, within our framework, how to relate the scalar functions of similar users so that they can lend each other support on unobserved items. Another open question is how to introduce a background distribution as a prior on the scalar function; e.g., the global opinion on an item, or its topological properties, might indicate its prior probability of being a source node.